

### Research Journal of Pharmaceutical, Biological and Chemical Sciences

# Input driven dynamically reconfigurable fixed width multiplier for low power DSP applications.

N Rameshbabu\*, DRVA Sharath Kumar\*, and Finney Daniel N\*.

<sup>1</sup>Associate Professor CIET, Guntur, Andhra Pradesh, India.
 <sup>2</sup>Professor, SMEC, Hyderabad, Telangana, India.
 <sup>3</sup>Assistant Professor, CESD, Vijayawada, Andhra Pradesh, India.

### ABSTRACT

In this paper, we propose an approximate multiplication technique to meet the energy-efficiency demands of digital signal processing (DSP) applications. In most cases, fixed-point arithmetic is used to keep the dynamic range within a limit set by the trade-off between accuracy and resource constraints. In this work, we exploit this fixed-width constraint and carry out the multiplication on m consecutive bits (i.e., an m-bit segment) of each n-bit operand for energy efficiency. For a positive number, the m-bit segment can start at any bit position, depending on where the leading one bit is located. Selecting the input bits dynamically in this way gives much higher accuracy than simply truncating the LSBs, because it captures more of the significant bits. We present the trade-off between computational accuracy and energy consumption at various input bit sizes. We also propose Vedic multiplier architectures to improve the speed of the dynamic segment method. Compared with the array multiplier, the proposed multiplier consumes less energy with average computational error. Finally, we evaluate performance and accuracy through FPGA synthesis with an EDA tool and exhaustive test bench simulation.

**Keywords**: Digital signal processing (DSP), Dynamic segment method (DSM), Truncation errors, Energy consumption.





### INTRODUCTION

In real-time digital systems, signal processing is specified with fixed-point computation that must maximize accuracy while saving energy, in order to satisfy the power consumption constraints of embedded systems. Multiplication is a basic operation in digital signal processing (DSP) applications such as FIR filters and the fast Fourier transform [1-2]. Various methodologies have been proposed to analyze the impact of the fixed-point DSP architecture on computational accuracy [3-4]. The truncation schemes proposed in [5-6] demonstrate graceful degradation of computations and describe how the overall system accuracy is least affected by fixed-width digital multiplication. The multiplier proposed in [7] focused only on low-cost implementation and ignored the impact of truncation on the error tolerance that a DSP process can sustain.

The benefits of bit truncation in fixed-width arithmetic for reduced hardware cost, and the freedom to select the input bits for partial product generation so as to yield more accurate results, have been largely unexplored. The array-structured architecture proposed in [8] reduces complexity through a truncation process that selects significant and non-significant columns based on the binary weights of the generated partial products. A simple deterministic constant bias is then added to reduce the error caused by the fixed-width constraint. In many cases an additive error model is used to describe the impact of bit truncation in fixed-width multipliers; as noted above, such truncation does not add significantly to the error already caused by bit quantization, so the results can still be acceptable in practice.

### **OPERAND SEGMENTATION SCHEME:**

Any fixed-width multiplication approach has a negative impact on computational accuracy, because it discards the LSB region as redundant bits. To avoid the redundant computation involved in the LSB region, the operands first pass through a data conditioning stage that performs the segmentation. The adjusted operands then pass through the arithmetic unit, which performs the multiplication, and the result is normalized and rounded before the output stage.

The novelty of the proposed work is a hybrid segmenting scheme that addresses energy efficiency and error tolerance jointly in the multiplication process. The key difference between our approach and previously published works [9-10] that explored computational error tolerance is that it combines significance-driven operand segmentation with divide-and-conquer Vedic computation, without loss of generality. Our design technique builds on the fact that not all DSP computations are quantized equally, and on determining the output quality obtained for the energy consumed.
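The conditioning, multiply, and normalize stages described above can be sketched in software. The following is a minimal Python model, not the authors' hardware: it assumes unsigned operands with n = 16 and a hypothetical segment width m = 8, and the helper names `dsm_segment` and `dsm_multiply` are illustrative only. Rounding before the output stage is omitted.

```python
from typing import Tuple


def dsm_segment(x: int, n: int = 16, m: int = 8) -> Tuple[int, int]:
    """Data conditioning: return (segment, shift) for an unsigned n-bit operand.

    The segment is the m-bit window whose top bit is the leading one of x;
    shift is the number of low-order bits that were dropped.
    """
    if not 0 <= x < (1 << n):
        raise ValueError("operand out of range")
    leading = x.bit_length() - 1        # position of the leading one, -1 if x == 0
    shift = max(leading - (m - 1), 0)   # operands that already fit in m bits are untouched
    return x >> shift, shift


def dsm_multiply(a: int, b: int, n: int = 16, m: int = 8) -> int:
    """Multiply two m-bit segments, then shift the product back into place."""
    seg_a, shift_a = dsm_segment(a, n, m)
    seg_b, shift_b = dsm_segment(b, n, m)
    return (seg_a * seg_b) << (shift_a + shift_b)   # normalization step


if __name__ == "__main__":
    a, b = 0x1234, 0x00FF
    approx, exact = dsm_multiply(a, b), a * b
    print(approx, exact, f"{100 * (exact - approx) / exact:.2f}% error")
```

Because only low-order bits are dropped, the approximate product in this model never exceeds the exact one, and operands that fit within m bits are multiplied exactly.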

The major contributions of this work are as follows.

- To design an error-tolerant fixed-point multiplier using an input-driven, significance-enabled dynamic segment method to reduce computational error.
- To implement a MUX-based selection circuit that provides a variable selection of the input operand range, for a trade-off between quality and energy consumption.
- To improve the overall system performance with Vedic computation by incorporating a divide-and-conquer approach, which involves prediction of the critical path to achieve the maximum operating speed while ensuring either error-free computation or graceful degradation of accuracy.

### **PROPOSED ARCHITECTURE**

In this section, we describe the methodology used to scale the input operands without compromising computational accuracy while gaining energy efficiency. We consider 16-bit input operands and produce fixed-width 16-bit output results, which incur a significant truncation error. Although reducing the input operand size always leads to a corresponding reduction in hardware complexity relative to the full-width version, it also causes a severe performance loss compared with a full-width multiplier with direct truncation. It is shown that the error level is a function of the distribution of binary weights in the input operand. In any DSP block architecture, the numerical inaccuracy caused by truncation in the multiply-accumulate (MAC) unit degrades system performance. The effectiveness of the proposed scheme is evaluated for a finite impulse response (FIR) filter architecture.
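To illustrate why the error depends on where the significant bits sit in the operand, the sketch below compares two ways of reducing a 16-bit operand to 8 bits: dropping the eight LSBs unconditionally, and keeping the 8-bit window that starts at the leading one. It is a software illustration under assumed parameters (n = 16, m = 8, unsigned operands); the function names are hypothetical and the operand sweep is arbitrary rather than taken from the paper.

```python
def truncate_lsbs(x: int, n: int = 16, m: int = 8) -> int:
    """Keep the top m bits of an n-bit operand, zeroing the n - m LSBs."""
    drop = n - m
    return (x >> drop) << drop


def leading_one_segment(x: int, m: int = 8) -> int:
    """Keep the m bits that start at the leading one, zeroing the rest."""
    drop = max(x.bit_length() - m, 0)
    return (x >> drop) << drop


def mean_relative_error(reduce_fn, operands) -> float:
    """Average relative product error over all operand pairs."""
    total, count = 0.0, 0
    for a in operands:
        for b in operands:
            exact = a * b
            total += (exact - reduce_fn(a) * reduce_fn(b)) / exact
            count += 1
    return total / count


# One operand per leading-one position, so small and large magnitudes are both covered.
OPERANDS = [(1 << k) | (0x5555 & ((1 << k) - 1)) for k in range(16)]

if __name__ == "__main__":
    print("LSB truncation    :", mean_relative_error(truncate_lsbs, OPERANDS))
    print("leading-one window:", mean_relative_error(leading_one_segment, OPERANDS))
```

In this model, operands below 2^8 vanish entirely under unconditional truncation, whereas the leading-one window keeps every operand within a relative error of 2^-(m-1) of its true value.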




### Weightage Distribution:

To limit the error made when estimating the starting bit position used to extract an m-bit segment from an n-bit operand in the SSM method, the binary values of the input operands are evaluated. A unique weighting function is assigned to each input operand, and the segmentation is carried out accordingly. In SSM methods, regardless of m and n, there are only four possible combinations of taking two m-bit segments from two n-bit operands for a multiplication.
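The four combinations can be made concrete with a small model. The text does not spell out the SSM selection rule, so the sketch below assumes a formulation in which each operand contributes either its upper m bits (when any bit above the lower m bits is set) or its lower m bits; two choices per operand give four segment pairings. The function names and the m = 8 default are illustrative.

```python
def ssm_segment(x: int, n: int = 16, m: int = 8):
    """Return (segment, shift) using one of two fixed segment positions."""
    if x >> m:                        # a one appears above the lower m bits
        return x >> (n - m), n - m    # take the upper m bits
    return x, 0                       # take the lower m bits


def ssm_multiply(a: int, b: int, n: int = 16, m: int = 8) -> int:
    """Multiply the two selected segments and shift the product back."""
    seg_a, shift_a = ssm_segment(a, n, m)
    seg_b, shift_b = ssm_segment(b, n, m)
    return (seg_a * seg_b) << (shift_a + shift_b)


if __name__ == "__main__":
    for a, b in [(100, 200), (300, 200), (100, 40000), (300, 40000)]:
        shifts = (ssm_segment(a)[1], ssm_segment(b)[1])
        print(a, b, shifts, ssm_multiply(a, b), a * b)
```

Under this assumed rule, an operand only slightly larger than 2^m keeps very few of its significant bits (300 is reduced to 256), which is the kind of case where a fixed segment position loses accuracy.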



Fig.1. Proposed ID-DSM multiplier architecture

### Decomposed Vedic scheme:

The performance and flexibility of the core multiplier strongly affect the characteristics of the entire DSP system. The value of a high-performance ID\_DSM is enhanced further by the Vedic algorithm, which extends easily to any input size because it allows a divide-and-conquer approach; this results in a large reduction in critical path delay as well as component cost. The segmented operands are split into smaller operands of width N/2 to allow the computation of inner products, and this increased flexibility results in a reconfigurable switching network.
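The divide-and-conquer decomposition can be illustrated as follows. This is a behavioural sketch rather than the authors' circuit: it assumes the operand width is a power of two, splits each operand into N/2-bit halves, forms the four half-width products recursively, and recombines them with shifts and adds. The name `vedic_multiply` and the 1-bit base case are illustrative choices.

```python
def vedic_multiply(a: int, b: int, width: int) -> int:
    """Exact product of two unsigned `width`-bit operands via N/2 decomposition."""
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")
    if width == 1:
        return a & b                          # a 1x1 multiplier is an AND gate
    half = width // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    # Four independent half-width products (parallel blocks in hardware).
    ll = vedic_multiply(a_lo, b_lo, half)
    lh = vedic_multiply(a_lo, b_hi, half)
    hl = vedic_multiply(a_hi, b_lo, half)
    hh = vedic_multiply(a_hi, b_hi, half)
    # Recombine according to the weight of each partial product.
    return ll + ((lh + hl) << half) + (hh << width)


if __name__ == "__main__":
    print(vedic_multiply(145, 255, 8), 145 * 255)
```

The four half-width products do not depend on one another, which is why such a decomposition is attractive for shortening the critical path in hardware.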

### Post normalization unit:

The data propagation path model of the proposed ID\_DSM can be repeated at different levels of granularity, depending on the error compensation required. This is particularly important when the multiplier is used in DSP applications. The weights parameterized during precomputation to handle the DSM of the input operands are also useful for segmenting the data at the output, thereby giving a superior trade-off between performance and hardware complexity. The parameterizable architecture allows the operand size to be expanded without any significant change to the architecture or the system model.

### PERFORMANCE EVALUATION

In this work, the hardware modules are described in Verilog HDL and synthesized using the ALTERA FPGA tool. We also present the proposed dynamic error-control scheme for digital (FIR) filtering in order to show the impact of the ID\_DSM method on overall system performance. In the proposed scheme, shown in Fig 2, the filter input and the FIR coefficients are fed to the weightage computation unit and the DSM block, which act as an error-control block that determines the errors in the filter output and reduces their effect on system performance. The term error here refers to a soft error arising during filter implementation, not to errors induced by other processes such as deep submicron noise.

March – April 2017 RJPBCS 8(2) Page No. 924

## Table I Comparison of hardware and time complexities of the proposed ID\_DSM over conventional approaches:

| S.No | Parameter                      | Direct truncation method | SSM    | ID_DSM |
|------|--------------------------------|--------------------------|--------|--------|
| 1.   | Logic elements used            | 1044                     | 344    | 413    |
| 2.   | Logic registers used           | 544                      | 208    | 256    |
| 3.   | Number of transitions occurred | 197785                   | 49596  | 56513  |
| 4.   | Power (mW)                     | 121.11                   | 82.21  | 104.48 |
| 5.   | Computational error            | 5-6%                     | 25-30% | 5-6%   |
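The relative savings implied by Table I can be checked with a few lines of arithmetic. The snippet below only restates the table's figures for the direct truncation method and ID_DSM; the variable names are illustrative.

```python
# Figures copied from Table I as (direct truncation, ID_DSM).
logic_elements = (1044, 413)
transitions = (197785, 56513)
power_mw = (121.11, 104.48)

power_saving_pct = 100 * (power_mw[0] - power_mw[1]) / power_mw[0]
logic_ratio = logic_elements[0] / logic_elements[1]
transition_ratio = transitions[0] / transitions[1]

print(f"power saving       : {power_saving_pct:.1f} %")
print(f"logic element ratio: {logic_ratio:.2f}x")
print(f"transition ratio   : {transition_ratio:.2f}x")
```

On these figures, ID_DSM draws roughly 13.7% less power than direct truncation while using about 2.5 times fewer logic elements and about 3.5 times fewer signal transitions.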

### Accuracy trade-off analysis:

To demonstrate the efficiency of the ID\_DSM architecture over the conventional full-width multiplier with direct truncation in terms of computational accuracy, an exhaustive test bench simulation was carried out using a simulator tool. FIR architectures designed to work with both DSM and ID\_DSM modules as part of the MAC unit are also included for comparison.

### **Complexity analysis:**

A majority of works rely on an array environment, even for ID\_DSM operation, to demonstrate its merits over conventional methods. In contrast, the proposed method works well in array-based environments but can also be integrated with the Vedic approach for high performance. The conventional SSM method is also included in the comparison. As shown in Fig 2 and Fig 3, the proposed ID\_DSM offers significant energy and area efficiency together with high throughput.
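A software model of such a comparison might look like the sketch below, which runs a direct-form FIR filter whose MAC unit accepts any multiplier function. It is only an illustration: samples and coefficients are assumed to be unsigned 16-bit integers, `approx_mul` is a hypothetical leading-one segment multiplier with m = 8, and the test vectors are arbitrary rather than taken from the paper.

```python
from typing import Callable, List, Sequence


def approx_mul(a: int, b: int, m: int = 8) -> int:
    """Multiply the m-bit leading-one segments of two unsigned operands."""
    shift_a = max(a.bit_length() - m, 0)
    shift_b = max(b.bit_length() - m, 0)
    return ((a >> shift_a) * (b >> shift_b)) << (shift_a + shift_b)


def fir(samples: Sequence[int], coeffs: Sequence[int],
        mul: Callable[[int, int], int]) -> List[int]:
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0                                  # MAC accumulator
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += mul(c, samples[n - k])
        out.append(acc)
    return out


if __name__ == "__main__":
    x = [1200, 40000, 515, 65535, 9, 30001]
    h = [300, 1000, 300]
    exact = fir(x, h, lambda a, b: a * b)
    approx = fir(x, h, approx_mul)
    for e, a in zip(exact, approx):
        print(e, a, f"{100 * (e - a) / e:.2f}%")
```

Passing the multiplier in as a parameter keeps the filter structure identical across runs, so any difference in the output comes from the multiplier alone.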

### Hardware synthesis results comparison:

The proposed architecture is then compared in terms of design complexity, operating frequency, and power consumption. The power figure is useful because it demonstrates the energy efficiency of ID\_DSM in a DSP architecture, as shown in Fig 4 and Fig 5. As shown in Table I, energy is saved through a considerable reduction in dynamic bit transitions. The computational error of the proposed ID\_DSM is the same as that of the conventional direct truncation method, whereas the SSM-based approach is not suitable for real-time DSP applications: its computational error is not tolerable because there are no prior statistics of the input values.

| S.No | Parameter            | Array ID_DSM | Vedic ID_DSM |
|------|----------------------|--------------|--------------|
| 1.   | Logic elements used  | 413          | 266          |
| 2.   | Logic registers used | 256          | 159          |
| 3.   | Operating frequency  | 123.15 MHz   | 303.67 MHz   |
| 4.   | Delay                | 8.12 ns      | 3.29 ns      |
| 5.   | Throughput           | 1.97 Gbps    | 4.86 Gbps    |

### Table II performance comparison of divide and conquer based Vedic in ID\_DSM method:

The results are presented in Tables I and II. The proposed architecture offers almost 14% reduction in power and a 2.5 times reduction in hardware complexity, while operating at 303.67 MHz with a latency of three clock cycles. In terms of throughput, the Vedic-based design reaches 4.86 Gbps against 1.97 Gbps for the array-based design, roughly a 2.5 times improvement, and it shows correspondingly higher throughput/power/area efficiency than the other architectures.
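The delay and throughput rows of Table II are consistent with simple relations to the operating frequency. The sketch below assumes delay = 1/Fmax and throughput = Fmax × 16 bits (one 16-bit result per clock cycle). The 16-bit word size is taken from the operand width stated earlier; that the table was derived this way is an inference, although it reproduces the tabulated values. The label "Vedic ID_DSM" for the 303.67 MHz design is likewise inferred from the table caption.

```python
WORD_BITS = 16  # assumed: one 16-bit result per clock cycle


def delay_ns(fmax_mhz: float) -> float:
    """Clock period in nanoseconds for a given Fmax in MHz."""
    return 1e3 / fmax_mhz


def throughput_gbps(fmax_mhz: float, bits: int = WORD_BITS) -> float:
    """Bits produced per second, in Gbps, at one result per cycle."""
    return fmax_mhz * bits / 1e3


for name, fmax in (("Array ID_DSM", 123.15), ("Vedic ID_DSM", 303.67)):
    print(f"{name}: {delay_ns(fmax):.2f} ns, {throughput_gbps(fmax):.2f} Gbps")
```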









### Throughput analysis (in Fmax)



| Flow Summary field                 | Value                                   |
|------------------------------------|-----------------------------------------|
| Flow Status                        | Successful - Tue Oct 18 13:18:35 2016   |
| Quartus II Version                 | 9.0 Build 132 02/25/2009 SJ Web Edition |
| Revision Name                      | TOP                                     |
| Top-level Entity Name              | MULTIPLIER                              |
| Family                             | Cyclone III                             |
| Met timing requirements            | N/A                                     |
| Total logic elements               | 413 / 5,136 (8 %)                       |
| Total combinational functions      | 381 / 5,136 (7 %)                       |
| Dedicated logic registers          | 256 / 5,136 (5 %)                       |
| Total registers                    | 256                                     |
| Total pins                         | 67 / 183 (37 %)                         |
| Total virtual pins                 | 0                                       |
| Total memory bits                  | 0 / 423,936 (0 %)                       |
| Embedded Multiplier 9-bit elements | 0 / 46 (0 %)                            |
| Total PLLs                         | 0 / 2 (0 %)                             |
| Device                             | EP3C5F256C6                             |
| Timing Models                      | Final                                   |

### Fig 4. Area utilization report




| Power report field                     | Value                                                |
|----------------------------------------|------------------------------------------------------|
| PowerPlay Power Analyzer Status        | Successful - Tue Oct 18 14:26:57 2016                |
| Quartus II Version                     | 9.0 Build 132 02/25/2009 SJ Web Edition              |
| Revision Name                          | TOP                                                  |
| Top-level Entity Name                  | MULTIPLIER                                           |
| Family                                 | Cyclone III                                          |
| Device                                 | EP3C5F256C6                                          |
| Power Models                           | Final                                                |
| Total Thermal Power Dissipation        | 104.48 mW                                            |
| Core Dynamic Thermal Power Dissipation | 22.25 mW                                             |
| Core Static Thermal Power Dissipation  | 46.20 mW                                             |
| I/O Thermal Power Dissipation          | 36.03 mW                                             |
| Power Estimation Confidence            | Low: user provided insufficient toggle rate data     |

Simulator message: Number of transitions in simulation is 56513

Fig.5. Power and signal transition report

### CONCLUSION

In this work, we have analyzed the dynamic trade-off between accuracy and energy conservation of a fixed-width multiplier for DSP applications. The proposed input-driven ID\_DSM multiplier takes m bits of an n-bit operand, either starting from the weight-driven MSB or ending at the LSB, and applies the two segments that include the leading ones of the two operands. By comparison, an SSM approach that identifies only the MSB part leads to a computational error of 25-30% in most cases. ID\_DSM uses the exact leading segment positions of the two operands and requires only m-bit segments, so it consumes much less energy and area than the direct truncation method at comparable accuracy. We achieved almost 14% better energy efficiency and 2.5 times better area efficiency, along with a considerable throughput enhancement, with the decomposed Vedic-based ID\_DSM. We also demonstrate that the loss of accuracy with ID\_DSM has no notable impact on the multiplier output, and we evaluated this through an FIR filter implementation.

#### REFERENCES

- [1] Y. C. Lim, "Single-precision multiplier with reduced circuit complexity for signal processing applications," IEEE Trans. Comput., vol. 41, no. 10, pp. 1333–1336, Oct. 1992.
- [2] A. Chandrakasan and R. W. Brodersen, "Minimizing power consumption in digital CMOS circuits," Proc. IEEE, vol. 83, no. 4, pp. 498–523, Apr. 1995.
- [3] V. Gutnik and A. Chandrakasan, "Embedded power supply for low power DSP," IEEE Trans. VLSI Syst., vol. 5, pp. 425–435, Dec. 1997.
- [4] N. Yoshida, E. Goto, and S. Ichikawa, "Pseudorandom rounding for truncated multipliers," IEEE Trans. Comput., vol. 40, no. 9, pp. 1065–1067, Sep. 1991.
- [5] J. M. Jou, S. R. Kuang, and R. D. Chen, "Design of low-error fixed width multipliers for DSP applications," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 46, no. 6, pp. 836–842, Jun. 1999.
- [6] M. J. Schulte, J. E. Stine, and J. G. Jansen, "Reduced power dissipation through truncated multiplication," in Proc. IEEE Alessandro Volta Memorial Workshop Low-Power Des., Mar. 1999, pp. 61–69.
- [7] S. S. Kidambi, F. El-Guibaly, and A. Antoniou, "Area-efficient multipliers for digital signal processing applications," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 43, no. 2, pp. 90–95, 1996.
- [8] P. Kulkarni, P. Gupta, and M. Ercegovac, "Trading accuracy for power with an underdesigned multiplier architecture," in Proc. 24th IEEE Int. Conf. VLSI Design (VLSID), Jan. 2011, pp. 346–351.
- [9] V. K. Chippa, D. Mohapatra, A. Raghunathan, K. Roy, and S. T. Chakradhar, "Scalable effort hardware design: Exploiting algorithmic resilience for energy efficiency," in Proc. 47th IEEE/ACM Design Autom. Conf., Jun. 2010, pp. 555–560.
- [10] L.-D. Van and C.-C. Yang, "Generalized low-error area-efficient fixed-width multipliers," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 8, pp. 1608–1619, 2005.